Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30,000 features.
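The mean field step at the heart of the BIA can be sketched as follows. This is a minimal illustration, assuming the external fields h and pairwise couplings J of the equivalent Ising model have already been derived from the data and the L2 penalty (the exact mapping is defined in the paper); the fixed-point solver, its damping parameter, and all names below are illustrative, not the authors' implementation.

```python
import numpy as np

def mean_field_magnetizations(h, J, n_iter=200, damping=0.5, tol=1e-8):
    """Solve the mean-field self-consistency equations
    m_i = tanh(h_i + sum_j J_ij m_j) by damped fixed-point iteration.

    h : (p,) external fields of the Ising model (assumed precomputed)
    J : (p, p) symmetric couplings with zero diagonal (assumed precomputed)
    Returns approximate marginal posterior inclusion probabilities.
    """
    p = h.shape[0]
    m = np.zeros(p)  # start from the zero-magnetization state
    for _ in range(n_iter):
        m_new = np.tanh(h + J @ m)
        # Damping stabilizes the iteration when couplings are strong.
        m_next = damping * m + (1 - damping) * m_new
        if np.max(np.abs(m_next - m)) < tol:
            m = m_next
            break
        m = m_next
    # Map spin magnetizations in [-1, 1] to inclusion probabilities in [0, 1].
    return (1 + m) / 2
```

Under these assumptions, sweeping the L2 penalty, recomputing (h, J) at each value, and re-solving the fixed point would trace out the feature selection path described in the abstract.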